Search results for "cross validation"
showing 9 items of 9 documents
Strategies to develop radiomics and machine learning models for lung cancer stage and histology prediction using small data samples
2021
Predictive models based on radiomics and machine-learning (ML) need large and annotated datasets for training, often difficult to collect. We designed an operative pipeline for model training to exploit data already available to the scientific community. The aim of this work was to explore the capability of radiomic features in predicting tumor histology and stage in patients with non-small cell lung cancer (NSCLC). We analyzed the radiotherapy planning thoracic CT scans of a proprietary sample of 47 subjects (L-RT) and integrated this dataset with a publicly available set of 130 patients from the MAASTRO NSCLC collection (Lung1). We implemented intra- and inter-sample cross-valida…
Superposing significant interaction rules (SSIR) method: a simple procedure for rapid ranking of congeneric compounds
2020
The Superposing Significant Interaction Rules (SSIR) method is revised and implemented. The method is a simple combinatorial procedure, which deals with in situ generated rules among a dichotomized congeneric molecular family, selecting the most probabilistically relevant ones. The mere counting of the number of relevant rules attached to new compounds generates a molecular ranking useful for database filtering, refinement and prediction. The algorithm only needs a symbolic molecular representation and this allows for mining the database in a confidential manner. Third parties will not know the real compounds that are on the way to be worked out. The procedure is tested for a complete s…
Prediction of Disease–lncRNA Associations via Machine Learning and Big Data Approaches
2021
This chapter introduces long non-coding RNAs and their role in the occurrence and progress of diseases. The discovery of novel lncRNA-disease associations may provide valuable input to the understanding of disease mechanisms at the lncRNA level, as well as to the detection of biomarkers for disease diagnosis, treatment, prognosis, and prevention. Unfortunately, due to costs and time complexity, the number of possible disease-related lncRNAs verified by traditional biological experiments is very limited. Computational approaches for the prediction of potential disease-lncRNA associations can effectively decrease the time and cost of biological experiments. We first review the main computatio…
HEp-2 Cell Classification with heterogeneous classes-processes based on K-Nearest Neighbours
2014
We present a scheme for the feature extraction and classification of the fluorescence staining patterns of HEp-2 cells in IIF images. We propose a set of complementary processes specific to each class of patterns to search. Our set of processes consists of preprocessing, feature extraction and classification. The choice of methods, features and parameters was performed automatically, using the Mean Class Accuracy (MCA) as a figure of merit. We extract a large number (108) of features able to fully characterize the staining pattern of HEp-2 cells. We propose a classification approach based on two steps: the first step follows the one-against-all (OAA) scheme, while the second step follows the…
High Performance Liquid Chromatography-Mass Spectrometry based chemometric characterization of olive oils
2005
In this study the effective discrimination of extra virgin olive oils is described using HPLC-MS, combined with chemometric evaluation. The presented method is simple since the diluted oil sample is directly injected into the system, without any preliminary chemical derivatization or purification step. Separation of diacylglycerols, triacylglycerols and sterols occurs within 20 min and is achieved using an octadecyl-silica column. Detection is performed by positive APCI mass spectrometry which provided sensitivity to detect over 50 compounds in the sample. After extraction of data, stepwise discriminant function analysis is used to select the variables with the highest discriminative power.…
Fishery-dependent and -independent data lead to consistent estimations of essential habitats
2016
Species mapping is an essential tool for conservation programmes as it provides clear pictures of the distribution of marine resources. However, in fishery ecology, the amount of objective scientific information is limited and data may not always be directly comparable. Information about the distribution of marine species can be derived from two main sources: fishery-independent data (scientific surveys at sea) and fishery-dependent data (collection and sampling by observers in commercial vessels). The aim of this paper is to compare whether these two different sources produce similar, complementary, or different results. We compare them in the specific context of identifying the Es…
KERNEL ESTIMATION OF THE TRANSITION DENSITY IN BIFURCATING MARKOV CHAINS
2023
We study the kernel estimator of the transition density of bifurcating Markov chains. Under some ergodic and regularity properties, we prove that this estimator is consistent and asymptotically normal. Next, in the numerical studies, we propose two data-driven methods to choose the bandwidth parameters. These methods are based on the so-called two bandwidths approach.
A rapid method for the differentiation of yeast cells grown under carbon and nitrogen-limited conditions by means of partial least squares discrimina…
2012
This paper shows the ease of application and usefulness of mid-IR measurements for the investigation of orthogonal cell states on the example of the analysis of Pichia pastoris cells. A rapid method for the discrimination of entire yeast cells grown under carbon and nitrogen-limited conditions based on the direct acquisition of mid-IR spectra and partial least squares discriminant analysis (PLS-DA) is described. The obtained PLS-DA model was extensively validated employing two different validation strategies: (i) statistical validation employing a method based on permutation testing and (ii) external validation splitting the available data into two independent sub-sets. The Variable Importa…
Data from: Fine-scale population dynamics in a marine fish species inferred from dynamic state-space models
2018
Identifying the spatial scale of population structuring is critical for the conservation of natural populations and for drawing accurate ecological inferences. However, population studies often use spatially aggregated data to draw inferences about population trends and drivers, potentially masking ecologically relevant population sub-structure and dynamics. The goals of this study were to investigate how population dynamics models with and without spatial structure affect inferences on population trends and the identification of intrinsic drivers of population dynamics (e.g. density dependence). Specifically, we developed dynamic, age-structured, state-space models to test different hypoth…